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We present a numerical method to identify regions of phase space that are approximately retained in a mobile 
compact neighbourhood over a finite time duration. Our approach is based on spatio-temporal clustering of 
trajectory data. The main advantages of the approach are the ability to produce useful results (i) when there 
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Coherent features in time-dependent dynamical 
systems are difficult to identify, and consider¬ 
able effort has been put into the development 
of identification algorithms. Most approaches re¬ 
quire knowledge of the dynamical system or high- 
resolution trajectory information, which in ap¬ 
plications may not be available. We present a 
trajectory-based method that is aimed squarely 
at the situation where the available information is 
poor: there are few trajectories, the available tra¬ 
jectories do not span the full time duration under 
consideration, and there are missing observations 
within trajectories. As our method is very simple 
to implement and fast to run, it also provides a 
rapid “first cut” coherent structure analysis even 
in situations where the full dynamical system or 
high-resolution trajectory data is available. 


I. INTRODUCTION 

There are a number of different concepts that describe 
the notion of coherent behaviour in time-dependent dy¬ 
namical systems. Probabilistic approaches define finite¬ 
time coherent as regions of phase space that 

minimally mix with the surrounding phase space during 
a specified time duration of finite length. Lagrangian co¬ 
herent structures can be defined as material lines that 
extremize a certain stretching or shearing quantit y 
while another approach tries to identify curves on which 
local dynamics approximates local rigid-body motion^. 
There are also topological^ and ergodicity-based^ de¬ 
scriptions of coherence, although these are not designed 
for aperiodic dynamics. Finally, a recent geometric 
characterisationii defines finite-time coherent sets as 
those sets with boundary to volume ratios that remain 
minimal under the evolution of the dynamics, and proves 
that such a characterisation arises naturally as the advec- 
tive limit of the probabilistic approache s 

In the present paper we develop cluster-based tech¬ 


niques to highlight distinct groups of trajectories that 
remain in compact, approximately spherical subregions 
of phase space over a finite time duration. Let ^ : 

X [0,T] ^ denote the flow of a continuous time 
dynamical system on i.e. <I>(x, t) denotes the state of 
the system at time t with initial value x (at time 0). We 
define a dynamic metric 

B{x,y):=[ (1) 

Jo 

based on some metric p on For example, if p is the 
Euclidean metric, then x and y are close according to D 
provided they remain close in a Euclidean sense averaged 
over the time interval [0,T]. 

In this general setup, one is free to choose p and also 
how the terms p($(x, t), <F(p, t)), t G [0,T] are combined 
to form D(x, y). Setting p to be the Euclidean metric is a 
natural choice if the trajectory data lies in and shortly 
we will give geometric reasons for why this is a good 
choice. The sum-of-squares combination is a convenient 
form for the specific numerical clustering approach pro¬ 
posed below. One could alternatively define, for exam- 
pie, iy{x,y) := ^ p{<^{x,t),^{y,t)Ydt, for 1 < p < oo, 
or D(a;, y) := inaxtg[o,T] p{^{x, t), 4>(y, t)). 

In practice, suppose we have n trajectories given at 
discrete times {0,1,...,T}, denoted Xi^t ^ = 

1,..., n, t = 0,..., T. We wish to cluster the initial 
points Xi^o G according to D. The discrete-time ver¬ 
sion of m is 

T 

D(xi,o,Xj,o) ='^p{xi^t,Xj,tf, (2) 

t=0 

for 1 < i,ji < n. At this point, one could calculate 
n{n — l)/2 interpoint distances 'D{xi^o,Xj^o), 1 < i < 
j <n. This general approach of clustering using the dy¬ 
namic metric o or ([2]) is very flexible and in principle 
one could employ any suitable (according to the three 
properties outlined below) clustering method on from 
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the vast number available (see e.g. Ref. [sl): centroid- 
based algorithms like k-means^i24 and fuzzy c-means^i^, 
spectral methods^i^, density-based clustering^i^, and 
methods based on community detection^ (for example, 
modularity^). 

The sum-of-squares form of D and choice of Euclidean 
metric for p allows us to further rewrite m as 

T 

= \\Xi - Xjf, (3) 

t=0 

for I < i^j < where Xi = (x^^o,• • •, Thus, 

we have a convenient representation of the dynamic met¬ 
ric D on as the squared Euclidean distance between 
trajectories in fl^o ~ 

As elaborated in the next section we use the fuzzy c- 
means clustering algorithm^i^ on Our reasons 

for using fuzzy c-means are threefold (but not necessarily 
exclusive to fuzzy c-means): 

1. When searching for K clusters^, fuzzy c-means 
produces K auxilliary “centres” and aims to allo¬ 
cate data to clusters by reducing the total squared 
distance from the data to their corresponding cen¬ 
tre. If p is the Euclidean metric, then fuzzy c- 
means will favour clusters that are close to spherical 
at each time instant. Such clusters therefore will 
not “spread out” in phase space, will remain in an 
approximately tubular region in lower-dimensional 
space-time (phase space plus one time coordinate, 
see Eigure[T]), and will on average have low bound¬ 
ary size to volume at each time instant. The low 
boundary size relative to volume property is com¬ 
patible with probabilistic approache o^^d^ and geo¬ 
metric approachesii to finite-time coherent sets. 

2. Euzzy c-means provides feedback in the form of the 
membership value describing the likelihood that a 
trajectory belongs to a cluster. Because finite-time 
coherent sets do not necessarily fully partition the 
phase space, we can identify non-coherent collec¬ 
tions of trajectories as those with a low membership 
for all clusters. 

3. Euzzy c-means is computationally efficient, partic¬ 
ularly for large numbers of trajectories. 

Einally, we note that once the initial points {:ri^o}i< 2 <n 
have been clustered, the full trajectories are also clus¬ 
tered, as by definition trajectories remain within the 
same cluster for alH = 0,..., T. 

Clustering trajectory data is a recent problem in the 
analysis of spatio-temporal datasets, with many contri¬ 
butions found in the data mining literature. We refer the 
reader to Ref. 1^ fo r a recent review and to the literature 
review in Ref. for a brief summary of the different 
approaches used for spatio-temporal clustering. Ref. [22| 
proposes an augmented fuzzy c-means algorithm, with 
different weights for the temporal and spatial compo¬ 
nents. Ref. [ 29 I and the thesis [28| introduce a distance 


measure essentially identical in form to ([T]) . Ref. [29| uses 
this metric with density-based algorithms^ to cluster tra¬ 
jectories in geo-referenced data sets and to find optimal 
time intervals for clustering. The papers [22|, [28|, and 
[ 29 I do not consider how to treat incomplete trajectory 
data. Other distance measures have been proposed to ac¬ 
count for application-specific purposes, e.g. for studying 
movement patterns in trafh o^^i^^ . However, to the best 
of our knowledge, spatio-temporal clustering approaches 
have not been employed for studying transport phenom¬ 
ena and coherent behaviour in time-dependent dynami¬ 
cal systems and an exploration of the tuning of clustering 
methods to this application has not been undertaken. 

Our main contributions are (i) posing the problem 
of identifying finite-time coherent sets as an objective 
trajectory-based clustering problem, (ii) developing a 
methodology for handling incomplete data that consis¬ 
tently uses all available data, and (hi) indicating some 
rules of thumb for applying these techniques in practice. 

An outline of the paper is as follows. We describe 
our approach first in the situation where there are a fi¬ 
nite number of trajectories available, sampled at a finite 
number of times. We then consider combinations of a 
continuum of trajectories and a continuum of observa¬ 
tion times. Section HD concludes by showing that our 
clustering framework for coherent sets is objective and 
independent of isotropic scaling of space and time. Our 
method handles missing trajectory data naturally and we 
discuss this in Section Hill A discussion of false positives, 
possibly inaccurate results, and how to identify these is 
in Section lEl we also outline some rules of thumb for 
parameter choices. Section [V] illustrates the approach 
for several examples: firstly in one-dimensional dynam¬ 
ics, where the geometry of the spatio-temporal clustering 
is more transparent, secondly for the well-known double¬ 
gyre flow and the transitory double gyre flow for compar¬ 
ison with existing coherent set identification approaches, 
and thirdly on ocean surface drifter data. We demon¬ 
strate how reasonable results can be achieved even when 
large percentages of trajectory observations are missing, 
and when the trajectory dataset is comprised of trajec¬ 
tories much shorter than the full time duration under 
analysis. 


II. FULL DATA CASE 

We first describe the case where all trajectories span 
the finite time duration and there are no missing observa¬ 
tions. We begin by describing our setup in the situation 
where there are a finite number of trajectories sampled 
at a finite collection of time instances, this means that 
all trajectories are sampled at all time instances; we then 
follow with versions that are continuous in space and/or 
time. 
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A. Discrete setting: a finite number of finitely-sampled 
trajectories. 

Suppose we have n trajectories of maximum length 
T -f- 1, denoted Xi^t ^ i = 1,..., n, t = 0,1,..., T. 
We consider each trajectory {^2,t}o<t<T as a point Xi = 
^ which we also refer to as 

a trajectory. We imagine as 11^=0 where 

the product of copies of our phase space is or¬ 
dered in increasing time t. The fuzzy c-means clustering 
algorithm^i^ is a soft clustering based on the calculation 
of a centre for each cluster and a likelihood of member¬ 
ship of each data point to each centre. Suppose we have 
identified K cluster centres Ck G k = 1,... ,K. 

We may decompose the Ck as {ck o,Ck ... ,Ck t) ^ 

^d(T+l)^ 

SO that each Ck^t ^ can be regarded as a 
point in phase space at time t. Associated with each 
trajectory Xi is a likelihood 0 < Uk^i < 1 of being 
associated with the cluster centre Ck. 

Given trajectories XiC = cluster centres 

C/c, k = and membership likelihoods Uk,iC = 

1,..., n, /c = 1,..., X, the total “goodness of fit” of the 
memberships of trajectories in clusters is measured by 
likelihood-weighted intracluster distances, which we wish 
to minimise: 


4. Evaluate objective ©■ If the improvement in the 
objective is below a threshold, go to step 5; other¬ 
wise go to step 2. 

5. Output cluster centres Ck G k = 1,..., K 

and membership likelihoods Uk^i G [0,1],/c = 

1,...,X, i = l,...,n. 

The update rule in step 2 is constructed by fixing the Uk^i 
and choosing Ck so that the gradient of the objective (j4)) 
is zero. Similarly, the update rule in step 3 is constructed 
by fixing the Ck and choosing the Uk^i^o that the gradient 
of the Lagrangian incorporating (|4|) and the constraint 

Yjk=i = 1 is zero. 

Implementation in MATLAB: If X is an n x d(T+1) 
array of n trajectories in (i.e. the rows of X are the 
vectors X^, i = l,...,n discussed above). Algorithm 1 
is implemented in MATLAB by the function fern in the 
Fuzzy Logic Toolbox: 

opts(l)=m; 

[c,u]=fcm(X,K,opts); 

If f cm is called without opts, the default value of m is 2. 
For d = 2, to display the membership values for cluster 
k at time slice t G {0,1,..., T}, one can use 


K n K n T 

E 11^* = - ck,t II". 

k=l i=l k=l i=l t=0 

(4) 

This minimisation is subject to the constraints that (i) 

I2k=i = 1 for i = l,...,n and (ii) Uk,i > 0 for 
all k = 1,..., X, i = 1,..., n. The parameter m > 1 
is the fuzziness exponent. Increasing m corresponds to 
softer clusters, while as m approaches 1, the member¬ 
ship likelihoods converge to either 0 or 1, resulting in 
a hard clustering^ (this latter effect is most easily seen 
from the update rule ([6j) below). The basic fuzzy c-means 
algorithm^i^ in our notation above proceeds as follows. 

Algorithm 1: 


1. Initialize membership values Uk^i either randomly 
or computed via step 3 based on an initial seeding 
of K centres (e.g. randomly or by the /c-means++ 
algorithui^) 

2. Calculate centres: 


Ck = 




( 5 ) 


k = l,...,K. 


3. Update membership values: 


l/||X,-Cfc|p/(- 

Ef=i (i/||V-Q||2/(™-i))’ 


/c = 1,..., X, i = 1,..., n. 


scatter(X(:,2*t+l),X(:,2*t+2),[],u(k,:),U fo J 

For d = 3, one can similarly use scatters. 

We remark that the centres Ck = (c/c,0 5 ..., Ck^r)^ 

k = 1,..., X, are generally not true trajectories of the 
dynamical system, although they may be remarkably 
close to true trajectories in some cases. For each k = 

1,..., X, one can identify the maximum likelihood tra¬ 
jectory Xi* for the cluster, where = wegmaxiUk^i. 
The trajectory Xi* is the most likely to belong to the 
k^^ cluster and may be thought of as a “probabilistic 
centre” of the cluster. The probabilistic centers can be 
interpreted as “low dimensional representations” of the 
macroscopic behavior of the system, as they describe the 
coherent motion of trajectories in the corresponding clus¬ 
ter. We illustrate both the centre and maximum likeli¬ 
hood trajectory in Figure [TJ 

Note that there is potential for one to include weights 
as coefficients for the terms \\xi^t — Ck,t\\‘^ in which 
could depend on or /c. A particularly important ex¬ 
ample is the inclusion of weights Qi > 0 corresponding 
to the “mass” assigned to a point Xi^o. For example, if 
one is searching for coherent regions in an oil or chemical 
spill in the ocean, one is likely interested in the behavior 
of the oil or chemical, rather than the water. In order to 
obtain clusters that focus on the nonuniform distribution 
of oil or chemical, one can replace with QiU^^ in (j4]) 
and (pT]) . 

Also note that at present, clustering into spheres 
is preferred by the Euclidean norm. If one wishes 
to favour clustering into ellipsoids, with orthogonal 
semi-axis vectors vi,... ,Vd and corresponding semi-axis 
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FIG. 1: Result of clustering of the double-gyre flow with 
n = 512, if = 2, m = 2, d = 2, T = 50 (corresponding 
to the real-time interval [0, 5] in time steps of 0.1). 
Trajectories Xi with membership values Uk^i > 0.9, 

/c = 1, 2 are shown. Note that in this figure we display 
space-time using time as a third coordinate, whereas the 
clustering computations in Algorithm 1 occur in 
One sees that the red and black trajectories remain in a 
compact region as time evolves. The probabilistic centre 
of the cluster corresponding to the red trajectories is 
shown as a meandering black curve, the light blue curve 
is the centre as computed by Algorithm 1. 


elements of as before, and we write an indi¬ 
vidual trajectory as X{xq) G Note that the 

likelihoods Ux^^k are also continuously parameterised by 
xq G a, and we write these in functional form as Uk(xo), 
so that Uk : A k = l,...,i^. In the finite- 

trajectory setting, the initial points of trajectories need 
not be uniformly distributed over the phase space, nor 
be given a uniform weight. If one wishes to model the 
evolution of a passive tracer field with nonuniform den¬ 
sity, one will either have a greater density of points in 
areas of high tracer density or apply weights to points 
with higher tracer concentration. To capture this ef¬ 
fect in the continuum setting, we need a density func¬ 
tion q : A ^ satisfying f^q(y) dy = 1. We interpret 
/b (cco) fraction of initial points that belong 

to an e-neighbourhood of xq. For example, if the initial 
xq are uniformly sampled over A, then q = l/vol(A). 

Equation (|1|) now reads 


W f Uk{xo)”^\\X{xo) -Cjklpg(xo) dxo 

k=i B 

K T . 

EE / Uk{xo)""\\x{xo,t) - Ck,t\fq{xo) dxo (7) 

u —1 +—n J -A 


k=lt=0 ' 


lengths ^1 ,..., then one may simply scale the Xi^t data 
in along each Vj by l/ij^ j = 1,..., d, use the Eu¬ 
clidean norm in the objective above, and then rescale the 
data along each Vj hy ij^ j = 1 ,... ,d. Other distance 
functions could be used to replace Euclidean distance, 
but in the absence of specific replacement motivations 
based on known properties of the underlying dynamical 
system. Euclidean distance represents a natural isotropic 
default distance metric. 

Sections ril BHII PI outline extensions of the above setup 
to situations where either one or both of the spatial data 
or temporal data are on a continuum. These construc¬ 
tions are mainly of a theoretical nature, but have been 
included to (i) demonstrate what the analogous objects 
are in a continuum setting if e.g. a full dynamical systems 
model were available, and (ii) indicate how the discrete 
“finite data” setting above is a special case (constructed 
by subsampling in space and/or time) of the continuum 
“full model” setting. Sections III Bflll Dl could be omitted 
on a first reading. 


B. Semi-continuous setting #1: a continuum of initial 
points, with trajectories finitely-sampled in time. 

Suppose we have a continuum of initial points in a 
set A C We now write trajectories as x{xoA) ^ 

]R^,xo G A,t = 0,1,...,T. Individual trajectories 
{x{xoA)}o<t<T for fixed xq e A are still regarded as 


Here is a simple example to help visualise what is go¬ 
ing on. Let phase space be [0,1], and consider trajec¬ 
tories of length two, generated by a map S : [0,1] O. 
Geometrically, we look for clusters in data of the form 
(x(xo, 0), x(xo, 1)): for all xq G [0,1], which is nothing 
but the (weighted, if q is not constant) graph of S con¬ 
sidered as a one-dimensional subset of [0,1]^; see Eig- 
ure[2](a). Eigure[2jb) shows clusters in data of the form 
(x(xo,0),x(xo,2)). 



EIG. 2: Graphs of the interval map S dm), which 
permutes the intervals [0,1/3], [1/3, 2/3], [2/3,1]. (a) x 
vs S{x)] (b) X vs S‘^(x). Gonsidered as a time series of 
length 2, Algorithm 1 would seek clusters in these 
graphs, considered as a subset of [0,1]^. 
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C. Semi-continuous setting #2: a finite number of 
continuously-sampled trajectories. 

Suppose now that instead of a continuum of finitely- 
sampled trajectories, we now have a finite collection of 
trajectories observed continuously in time: Xi{t),i = 
1,..., n, t G [0, T], with optional weights i = 1,..., n, 
representing the mass of the point ^^(O). We consider 
Xi : [0, T] i = 1,..., n as a finite number of con¬ 

tinuous mappings from [0, T] to Similarly, the cluster 
centres : [0, T] k = 1,..., if are a finite num¬ 

ber of (not necessarily continuous) mappings from [0,T] 
to The RHS of ^ becomes 


ax{xo, = ack{t),u'j^{xQ) = Uk{xo), and q'ix^) = 

q{x{))/a^. Then changing variables from xq to Xq and 
from t to t' we have 


m 


K r/ 3 T / n 

= E/ / 

/c=l ^ ol-A 

g(xo) dx'o) dt' 

= / (f Aixo)"'\W{xo,t')-c'^{t') 

fe=l Jo \Ja' 

q^Xg) dXg) dt'. 


K n 

uT,i \\xiit) - Ck{t)fqi dt. (8) 

Jo 

To visualise the integral, imagine we have a one¬ 
dimensional (time-dependent) flow in a phase space M, 
and Xi{t) is one trajectory from t = 0 to t = T. The 
integral (|8]) computes the total (g-weighted) squared dis¬ 
tance between the graphs of the functions Xi and Ck in 
[0,T] X M; or in other words, the squared distance 

ll^i ~ lli2([0,T])- 


Thus, switching to the primed coordinates will simply 
increase the objective © by a constant factor [3 over 
the original unprimed coordinates. Again, the cluster 
centres and likelihood functions that minimise are 
isotropically scaled and unchanged, respectively, under 
this isotropic scaling of space and/or time. In particular, 
the clustering algorithm does not care how space is scaled 
against time. 

F. Frame-independence 



D. Fully continuous setting: a continuum of initial points, 
continuously-sampled trajectories. 

Combining the constructions from the previous two 
paragraphs, we now have functions x : A x [0, T] ^ 

Our likelihood functions remain as \ A ^ M+, 
k = 1,... ,K. The RHS of (j4]) becomes 

UiL Uk{xo)'^\\x{xo,t) - Ck{t)\\'^q{xo) dxo^ dt. 

( 9 ) 

E. Isotropic scaling of space and time has no effect 

Given that our clustering is occurring in the product 
space formed from as many copies of our phase space as 
there are time instants, it is pertinent to consider the 
effect, if any, of isotropically scaling space and time. We 
show that in fact, there is no real effect caused by such 
scaling. 

In the fully discrete setting, by o if space were scaled 
isotropically by a factor a and time by a factor /3, then 
(HI) would simply increase by a‘^. Thus the cluster cen¬ 
tres and likelihood functions that minimise are simply 
isotropically scaled and unchanged, respectively, under 
this isotropic scaling of space and/or time. 

In the continuum setting, we again consider scaling 
space isotropically by a factor a and time by a factor 
yd. This amounts to defining new “primed” variables: 
Xg = axg^ t' = ydt. A' = aA^ T' = ydT, x'{xgA') = 


To check frame-independence of an algorithm, one ap¬ 
plies the algorithm to an original dataset, then subjects 
the dataset to a (possibly time-dependent) afhne trans¬ 
formation, where the linear part is orthogonal. If the 
algorithm applied to the transformed dataset yields the 
transformed output of the original dataset, then the al¬ 
gorithm is frame-independent; see Ref. [sl for details. 

We consider the situation where we have a finite collec¬ 
tion of finitely-sampled trajectories; the arguments pre¬ 
sented apply equally to the other situations discussed in 
Sections HmUTDl 

Proposition: Algorithm 1 is frame-independent. 

Proof: Let {xi^t}i<i<n,o<t<T be an original collection 
of trajectories. Apply Algorithm 1 to {xi^t} to obtain 
centres Ck = {ck,o, ■ ■ ■ ,Ck,T) S k = 1,...,K 

and likelihoods Uk^i^ /c = l,...,i^, i = l,...,n that min¬ 
imise Denote the transformed trajectories i/i^t •= 

OfXi^t + Ot, where Of is an orthogonal d x d matrix and 
Of G Form transformed centres ^ := OtCk^t + Ot. 
Notice that ® has the same value when evaluated with 
{ck,t} and {uk,i}, and with J and 

{uk,i}. This is because the transformation x \-A OfX + Ot 
is an isometry with respect to the Euclidean norm for 
each t = 0,1,..., T. Because {ck^t} and {uk,i} minimise 
(|1]) for the dataset {xi^t}^ one has {c^ and {uk,i} min¬ 
imise (|4]) for the dataset D 

If we use a non-standard inner product (•,•)' := x^Qx 
for some symmetric positive-definite d x d matrix Q to 
define a norm || • ||' on each phase space slice then an 
analogous proposition would hold under transformations 
X i-G Otx Ot provided O/ = Q~^OjQ = 
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III. TREATMENT OF MISSING DATA 


4. Evaluate the objective 


Missing data can be treated naturally in our spatio- 
temporal clustering framework. Taking the finitely sam¬ 
pled, finite trajectory setting of Section EAl by missing 
data^ we mean a trajectory {xi^t}o<t<T where the values 
Xi^t are available only on a strict subset of time instances 
Ti C {0, that is, only {xi^t}teTi is available. In 

terms of the abstract dynamic norm m, we handle this 
by leaving out those terms in the sum over t in equation 
(|3|) that correspond to times at which data is unavail¬ 
able. Thus, the treatment of missing data we propose is 
not specific to fuzzy clustering. In the fuzzy clustering 
framework, this corresponds to excluding those time in¬ 
stants t e for which Xi^t is unavailable from both the 
centre update and membership likelihood update rules. 
Thus, only data that is available at a particular time in¬ 
stant t is used to calculate cluster centre coordinates at 
that time t. 

To do this efficiently, we consider the known portion of 
trajectory i, namely {xi^t}teTi^ ^s a point in the lower¬ 
dimensional space for the purposes of computing 

Euclidean distances in the clustering algorithm. This 
projection to a lower-dimensional space is easily incor¬ 
porated into Algorithm 1. Eor i = 1,..., n, we define tt^ : 

Rd(T+l) ^ Rd(T+l) by = y = . . .,Xi,T), 

where 


Xi,t = 


Xi^t, if ^ e 71; 
0, ift ^ Ti. 


( 10 ) 


To exclude unavailable observations from centre up¬ 
dates, for each time instant t = 0,1,...,T, we define 
It = {i :t eTi} C {1,... ,n}, namely the indices of all 
trajectories with observations available at time t. 

Algorithm 2: Clustering with missing data 


K n 

(13) 

k=l 2=1 

If the improvement in the objective is below a 
threshold, go to step 5; otherwise go to step 2. 

5. Output cluster centres Ck G A: = 1 ,..., A and 
membership likelihoods Uk^i G [0,1], A: = 1,..., A, 
i = 1,..., n. 

Algorithm 2 is also frame-independent; the proof is iden¬ 
tical to the proof of frame-independence of Algorithm 1. 

We remark that Algorithm 2 will have a preference 
for clusters that each contain a similar total amount of 
data; for example, one cluster comprising 20 trajecto¬ 
ries of length ten and another comprising 40 trajectories 
of length five both contain the same amount of data. 
In some problems, one may wish Algorithm 2 to have a 
preference for clusters with similar numbers of trajecto¬ 
ries, irrespective of the amount of available data in each 
trajectory. To achieve this, one can replace u^- with 
u^^/\Ti\ in (ITT]) in Step 2 and (H^ in Step 4. The reason¬ 
ing behind this replacement is that with the factor l/|7i|, 
m computes the average weighted squared distances 
from centres (per trajectory), whereas without this fac¬ 
tor, the total squared distances from centres is computed. 
With this altered objective function, one constructs the 
correspondingly altered update rules dllli-dllD as out¬ 
lined below Algorithm 1. We tested Algorithm 2 with 
and without this factor in the examples in Section [Vl and 
found little difference; we report the results without this 
factor. 


IV. WHAT CAN GO WRONG? 


1. Initialize membership values Uk,i. 

2. Calculate centres: 


^k,t — 




( 11 ) 


k = 1,..., A, t = 1,..., T. Note that we take 
a convex combination over only those observations 
available at time t. 


3. Update membership values: 


( 12 ) 


k = 1,..., A, i = 1,..., n. Note that when com¬ 
puting Euclidean distances, we project onto only 
those temporal copies of phase space in which tra¬ 
jectory data for Xi is available. 


Before we begin to outline some guidelines to avoid 
potential pitfalls in sections IIV BfHV^ we introduce a 
quantity that (along with the likelihoods Uk^i) can be 
useful for assessing confidence in the clustering reported 
by Algorithms 1 or 2. 


A. Entropy and classification uncertainty 

Each trajectory G % has relative probabilities 

Uk,i G [0,1], of belonging to cluster C/c, k = l,...,iG, 
respectively. We can now define an overall measure of 
certainty of cluster assignment of trajectory i via the nor¬ 
malised entropy of the probability vector ..., UK,i]^ 
namely 


hi 


log 

logK 


(14) 


The quantity hi takes values between 0 and 1, with = 0 
representing certain classification of trajectory i to one 
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of the K clusters and hi = 1 representing complete un¬ 
certainty of classification of trajectory i to one of the K 
clusters, see also Ref. 0. 

A collection of trajectories that are retained in a com¬ 
pact region of phase space over the time duration should 
correspond to a single cluster in Each of these 

trajectories should therefore have a low value of hi. A 
spatial plot of the field hi over the phase space is 
therefore useful for identifying the strength with which 
trajectories belong to clusters. Finer, cluster-by-cluster 
spatial information can be obtained by producing K 
spatial plots of the likelihoods Uk^i separately for each 


B. False positives 

Algorithms 1 and 2 will always produce centres and 
clusters, even if the system under consideration has no 
features that could be considered to be coherent. Thus, 
there is the possibility of Algorithms 1 and 2 reporting 
false positives. There are some easy ways to inspect the 
reported clusters and check for false positives. If the 
phase space is in one, two, or three dimensions, then one 
can visually inspect the clusters at each time instant to 
check if the clusters do indeed mostly remain in sepa¬ 
rate compact regions. This can be done by plotting Uk^i 
against Xi^t for k = 1,..., if and t = 0,1,..., T (us¬ 
ing e.g. the scatter command in MATLAB) to check 
the certainty of classification for individual clusters. If 
the phase space is not low-dimensional, one can plot Uk^i 
against i (or hi against i) and inspect how many tra¬ 
jectories have high confidence of classification. A low 
classification confidence is indicative of the cluster not 
corresponding to a coherent set. 


C. Choice of trajectory output times and choice of m 

Clustering with respect to the Euclidean metric be¬ 
comes less meaningful in high dimensions, with the dis¬ 
tribution of interpoint distances becoming increasingly 
tight. This can be partly mitigated by using an norm 
rather than the Euclidean ^2 norm, but we have found the 
following rules of thumb very helpful, and have achieved 
good results with the standard Euclidean norm. 

Firstly, one should choose the time between Xi^i and 
to represent some nontrivial dynamics. If the in¬ 
crement t ^ t 1 is too short, the dynamics is close to 
the identity transformation, and one adds d dimensions 
to the clustering problem (making it more difficult) for 
no information gain. On the other hand, the increment 
from t ^ should not be so long that the underlying 
dynamics appears random over one time step; a group of 
nearby points at time t should remain in a “connected” 
region at time t -h 1, even though this region may be 
stretched and folded. Secondly, the total time duration 
T should not be so long that the entire phase space is 


thoroughly mixed; for such T there is no chance of find¬ 
ing coherent sets. Once the step t ^ t + 1 and total 
duration T have been selected as above, one should ob¬ 
tain reasonable results. Finally, to fine tune the value of 
m to ensure robust results, we suggest the following rule. 
Begin with m = 2 and decrease m. For each value of 
m, record the locations of 5 the maximum likelihood 
trajectories at time t = 0 (the choice of t = 0 is arbi¬ 
trary). Find a range of m for which the locations of the 
maximum likelihood trajectories are stable (i.e. approx¬ 
imately fixed). Note that the centres at time t = 0 
will tend to continue to vary with m so they are not good 
indicators of cluster stability with m. 


D. Centre collapse 

If two or more of the reported cluster centres are all 
very close to one another in space, there are at least 
three possibilities. Firstly, it could be that there are no 
coherent structures in the trajectory data. Secondly, it 
could be that the choice of the step t ^ t-\-l and/or T are 
unsuitable. Thirdly, even if the choice of the step t ^ 
and/or T are reasonable, it could be that the value of m 
is too high. In our experiments we have found that the 
larger d(T 1) is (the larger the total dimension), the 
smaller m needs to be to avoid centre collapse. This is not 
surprising because with higher dimension, the interpoint 
distances distribution is more tight, and a lower value of 
m emphasises differences in distance more. This is the 
reason behind our suggestion in the previous paragraph 
to start with m = 2 and decrease m until the maximum 
likelihood trajectories are stable. 


E. Other inaccurate results 

For systems that do contain finite-time coherent sets, 
there are some points to bear in mind to increase the 
accuracy of the reported clusters. If a finite-time co¬ 
herent set is small relative to the domain size and few 
clusters are sought, because Algorithms 1 and 2 favour 
clusters containing approximately the same number of 
trajectories, the clusters may be much larger than the 
true coherent region. In such a situation, an inspection 
of the likelihood functions may reveal the small coherent 
regions as “high likelihood”. On the other hand, if there 
are few, large coherent sets, but one chooses a large value 
of K, then the coherent regions will likely be subdivided 
into several clusters. 

These effects can be studied by varying the number 
of clusters K (which is cheap to experiment with). For 
each K one can visually inspect the clustering confidence 
according to Uk^i and hi, as discussed in Section HVBi If 
a regime of cluster stability can be found for a number 
of consecutive K, this gives some confidence to the re¬ 
sults. Finally, if sufficient data is available, the results 
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can be checked against the classical finite-time coherent 
set identification method o^^d'^d^ . 

V. NUMERICAL EXPERIMENTS 
A. One-dimensional examples 



We start with three one-dimensional maps on 5'^, 
which we think of as the unit interval [ 0 , 1 ] with the end¬ 
points identified. Because we are on and not [0,1], 
the distance computation and the center updating are 
modified in the obvious way. The first example is given 
by 


{ 3x (mod |) + 1/3, X < 

3x — ^ (mod b) -j- 2/3, | < x < |, (15) 

3x — I (mod b)^ ^ 

The map S cyclically permutes the three intervals [0, -1), 
[b, |) and [|, 1) and mixes each interval internally. Thus 
the graph of S features three equally sized blocks that 
are cyclically permuted, see Figure O 

To test Algorithm 1 we select 1000 random initial con¬ 
ditions from [ 0 , 1 ] and iterate them nine times by the 
mapping S. We want to find clusters in 1000 data points 
in We choose K = 3 and a very small fuzziness 

parameter of m = 1.1. The membership functions of the 
three clusters are shown in Figure [3] (a). As expected, the 
three coherent sets obtained are comprised of the three 
intervals; the evolution of these intervals is visualized in 
Figure m The cluster centers are the centers of the inter¬ 
vals and the Uk,i describe a very sharp trajectory-cluster 
membership. Increasing the fuzziness to m = 2 gives a 
fuzzier result, but still has clear clusters; see Figure[3](b). 


0.9 
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FIG. 3: Membership functions for the three clusters of 
the one-dimensional map ([IS]) plotted against the initial 
conditions of the trajectories under consideration. 

(a) m = 1 . 1 , (b) m = 2 . 


If we reduce the number of desired clusters to A" = 2 
the algorithm will either merge the first two or the sec¬ 
ond two clusters, depending on how the initial conditions 
are distributed. Trying to approximate K = 4 coherent 
sets, one of the three clusters is divided into two clusters. 
Their centers are almost coinciding, an indication of false 


FIG. 4: Glustering of the Id map ([TS]) : time evolution 
of the three clusters, one of the cluster centers as 
computed by Algorithm 1 in red. 


positives, and the membership functions on this interval 
are very much fiuctuating, see Figure [5l 



0.4 0.6 


FIG. 5: Seeking four clusters of trajectories from m 
results in the partition of one of the clusters in two 
clusters with centers almost coinciding and highly 
fiuctuating membership functions (m = 1 . 1 ). 

The map S in m has perfectly coherent sets: there 
is no transport between the three coherent sets. We now 
briefiy consider two further one-dimensional systems on 
to demonstrate the more common setting of leaking 
coherent sets. The first system, which will be referred to 
as (FLQIO), is a repeated cycle of three maps Ti,T 2 ,T 3 , 
introduced in Ref. [ij. It was shown in Ref. [ij that there 
are two coherent sets of different sizes that are cyclically 
permuted. Details of the model can be found in Ref. [Ij 
(proof of Thm. 5.1 and Figure 1). Choosing again 1000 
random initial conditions from [ 0 , 1 ] and nine iterates of 
the maps (three cycles of T 3 0 T 2 oTi), we seek to find two 
clusters in the ten-dimensional data. For this we choose 
a fuzziness constant of m = 1.5. In Figure [ 6 ] we show 
the two clusters in space-time, plotting only those points 
with a membership value of at least 95% (according to 
the Uk,i) of belonging to one of the clusters. As expected, 
the cluster centers approximately cycle with period 3. 
We note that the two clusters at t = 0 (and thus at 
t = 3, 6 , 9) are consistent with the coherent sets obtained 
in Ref. [Tsf (see in particular Figure 2 in Ref. in where the 
supports of the positive/negative parts of the eigenvector 
shown there are in good agreement with the two clusters 
at t = 0 ). 

A more general situation has been discussed in Ref. M 
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FIG. 6: Clustering of 1000 trajectories of length 10 of 
the system (FLQIO) in space-time (m = 1.5). Plotted 
are only those points with Uk,i > 0.95. The solid and 
dashed lines indicate the centers of the two clusters as 
computed by Algorithm 1. 
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(a) (b) 

FIG. 8: Membership functions for the two clusters of 
the Id map (FLSIO) at time t = 2. (a) m = 1.5, (b) 
m = 2. 


Two coherent sets were extracted that move in an ape¬ 
riodic manner; see Example 1 in Ref. 0 for more de¬ 
tails of the underlying model, which we will refer to as 
(FLSIO) in the following. We can reproduce the coher¬ 
ent behaviour of (FLSIO) using the same setting as de¬ 
scribed above. Figure 0 shows the two clusters in space- 
time, again plotting only those points with a membership 
value of at least 95% for one of the two clusters. The 
results are consistent to those in Ref. [3 (see in particu¬ 
lar Figure 8 in Ref. [l^, where the supports of the posi¬ 
tive/negative parts of the Oseledets functions shown for 
iterates /c = 0,..., 5 are in good agreement with the two 
clusters at times t = 0,..., 5). The membership func¬ 
tions of the two clusters (plotted for time t = 2) for the 
choice m = 1.5 and m = 2 are shown in Figure [8] (one can 
also compare the form of the black membership function 
with the Oseledets function in Figure 8, Ref. [l3 for k=2). 
As anticipated, the clusters are not as clear-cut as in 
Figure [3l Eventually all trajectories will spread out over 
[0,1], so that spherical compact structures as detected by 
our approach cease to exist. We remark that in each of 
the one-dimensional examples, the maps have a uniform 
slope of 3, so that after the ninth iterate, nearby initial 
points have been separated by a factor of 3^ = 19683. 



EIG. 7: Glustering of 1000 trajectories of the system 
(ELS 10) in extended space {m = 1.5). Plotted are only 
those points with Uk^ > 0.95. The solid and dashed 
lines indicate the probabilistic centers of the two 
clusters. 


B. Double gyre flow 


We consider the time-dependent system of differential 
equations^ 


X = —7rAsin(7r/(x, t)) cos(7r^) 

df 

y = TTAcos{TTf{x,t))sm{Try)-X{x,t), 


(16) 


where f{x, t) = 6 sin(wt)a;^ + {1 — 26 sm{uit))x. 

Eor detailed discussions of the system we refer to 
Refs. [HI, [13, and [ 3 ^. As in Refs.lTBI and 0 we fix param¬ 
eter values A = 0.25, 6 = 0.25 and uo = 27 t and obtain 
a 1-periodic flow. In order to be able to compare our 
results with those in Ref. 0, where we have extracted 
two optimally coherent sets via transfer operator-based 
methods, we choose 2^^ initial points on a uniform grid 
on the invariant set [0,2] x [0,1]. Eor each of these ini¬ 
tial conditions we compute a trajectory on [0,r], where 
r = 1, 5,10. We output the trajectory data in increments 
of 0.1 time steps. Thus, for r = 1 each trajectory is rep¬ 
resented by a 22-dimensional vector (= (10-1-1) x 2), for 
r = 5 and r = 10 the corresponding vectors have length 
102 and 202, respectively. 

We start by extracting two clusters from the short tra¬ 
jectories (r = 1). The upper panel of Eigure [9] displays 
the membership values ui^i (note that U 2 ^i = 1 — ui^i) 
with respect to the initial conditions in the two dimen¬ 
sional phase space. To study the influence of the fuzzi¬ 
ness exponents on the results we choose m = 1.5 (Eigure 
[9l[a)) and m = 2 (Eigure [9][b)). Both plots give a clear 
indication of the two coherent sets. To get a more de¬ 
tailed picture about the certainty of cluster membership 
we compute the entropy h from (HU). The respective re¬ 
sults are shown in the lower panel of Eigure [H Eor the 
smaller fuzziness exponent m = 1.5 (Eigure [9l[c)) there 
are large regions of high certainty to belong to one of the 
two clusters, with some high uncertainty in the vicinity 
of the stable manifold of the hyperbolic periodic orbit 
on the x-axis. This uncertainty region increases signifi¬ 
cantly, when m = 2 (Eigure [9l[d)) is used. Here only the 
two regular regions (corresponding to invariant tori in 
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the time-1 flow map) are highlighted as the most certain 
regions. 



(c) (d) 


FIG. 9: 2-clustering of 2^^ trajectories in the double 
gyre flow m with flow time r = 1 for m = 1.5 (left) 
and m = 2 (right). (a),(b) membership values ui^i for 
m = 1.5 and m = 2, respectively. (c),(d) corresponding 
entropy plots using da. 

Note that a maximum likelihood hard partition into 
two sets gives the same result for both m = 1.5 and 
m = 2. The result is shown in Figure [TT] (a), with parts 
of the stable manifold of the hyperbolic periodic orbit on 
the x-axis superimposed. This known dominant (infinite¬ 
time) transport barrier determines a large part of the 
boundary between the two extracted coherent sets. This 
compares very well to the observations made in Ref. [13 
(see e.g. Figure 9.3 therein). 

We now consider longer trajectories with flow times 
r = 5 and r = 10. The respective results for m = 2 
are shown in Figure [TOl (a) and (b). As expected from 
what we have seen in Ref. [l3 the clustering of the initial 
conditions is again very much influenced by the stable 
manifold, see also Figure [TT] (b) and (c) for the respec¬ 
tive maximum likelihood partitions into two sets. This 
transport barrier also determines the regions of highest 
membership uncertainties, which is clearly visible in Fig¬ 
ure M ( c) and (d). We note that these entropy plots 
have striking similarity to the finite-time entropy fields 
obtained directly from the transfer operator—. 

A visualization of the clusters in space-time for flow 
time r = 5 is presented in Figure [TJ where from 512 
initial conditions we have plotted those trajectories for 
which the membership values Uk,i > 0.9 (m = 2). 

So far we have used high-resolution and complete tra¬ 
jectory data. We now test our approach in the situa¬ 
tion where the available information is poor. We use 
512 initial conditions on a regular grid on [0,2] x [0,1] 
and compute trajectories for flow time r = 5. We then 
destroy about 80% of the trajectory information by ran¬ 
domly setting the particle positions to NaN. This mimicks 
the situation that trajectories may not exist for the whole 
time span under consideration and additionally may have 
gaps in observation. Algorithm 2 produces two clusters 



(c) (d) 


FIG. 10: 2-clustering of 2^^ trajectories in the double 
gyre flow (fTOj) for m = 2 and flow times r = 5 (left) and 
r = 10 (right). (a),(b) membership values Ui^i for r = 5 
and r = 10, respectively. (c),(d) corresponding entropy 
plots using m- 



(b) (c) 


FIG. 11: Extraction of two clusters from 2^^ 
trajectories (m = 2) for different flow times in (pTj) 
based on maximum likelihood of the membership values 
as in Figures [9l and [TOl The dominant transport barrier 
is superimposed and bounds the two sets increasingly 
closely as r increases, as demonstrated in Ref. IB (a) 
flow time r = 1; (b) r = 5; (c) r = 10. 


from this highly incomplete trajectory data, as shown in 
Figure [T2j Note that even with this severe data thinning. 
Algorithm 2 still classifies the remaining data points to 
the correct sides of the transport barriers. 

Finally, we test what happens if we set if > 2. We 
restrict again to flow time r = 5 and 2^^ trajectories and 
choose m = 2. If K = 3 then compared to A" = 2 ei¬ 
ther the left or right cluster is subdivided as seen in the 
membership values in Figure [13] (a-c). The shapes of the 
resulting clusters in Figure [T3]( a,b) do not have any simi¬ 
larity with known coherent structures for this system, but 
apparently the respective trajectory bundles stay coher¬ 
ent in our sense - with the cluster centers well separated. 
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FIG. 12: 2-clustering of double gyre flow (p!6|) with 80% 
of the trajectory data missing, based on 2^ trajectories 
with flow time r = 5 (m = 2). (a) clusters at initial 
time, (b) clusters at final time. Corresponding 
transport barriers are superimposed. 


However, Figure[T3](c) reveals that the left cluster, which 
is also present in the 2-clustering considered in Figure [iQl 
is characterized by much higher membership function val¬ 
ues compared to the other two clusters. We note that for 
K = 4 we get a similar picture with the former two clus¬ 
ters both divided into two parts, and for K = b one of 
the former two clusters is divided into two and the other 
into three parts. 
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FIG. 13: 3-clustering of 2^^ trajectories in the double 
gyre flow for flow time r = 5 and m = 2. (a-c) 
membership functions for the three clusters. 


C. Transitory double gyre flow 


We consider the transitory dynamical systemp^ 


5 T . d ^ 
x = -—y = 

oy ox 


(17) 


and transition function 


i O, t < 0, 

^2(3 - 2t), 0 < t < 1, 

1, t>l. 

The nonautonomous dynamics, which rotates a double 
gyre pattern counter-clockwise by 90 degrees, is restricted 
to the time-interval [0,1]. In Ref. [i3 we have analysed 
this system using the transfer operator based coherent 
set framework. The unit square [0,1]^ is invariant under 
the flow and we choose 2^^ initial conditions on a regular 
grid. We consider the flow on the transition interval [0,1] 
and output the trajectory data in increments of 0.1 time 
steps. So T = 10 and thus we represent every trajectory 
as a 22-dimensional vector. 

Algorithm 1 with K = 2 and m = 1.5 returns Uk^i that 
take high values on the coherent sets observed in Ref. E 
as shown in Figure [TH 



(a) (b) 


FIG. 14: 2-clustering of 2^^ trajectories in the transitory 
double gyre flow dm) for the time interval [0,1]. (a),(b) 
membership values Uk^i^ k = 1,2 for m = 1.5. 

A visualization of the two clusters in space-time is pre¬ 
sented in Figure [HI where from 1024 initial conditions 
we have plotted those trajectories for which Uk,i > 0.95 
(m = 1.5) together with the probabilistic cluster centers. 

We study the influence of using information along a 
trajectory instead of only considering the initial and final 
points of a trajectory as many other identification algo¬ 
rithms do (i.e. taking 11 vs 2 time instances on [0,1]). 
The results of clustering 2^^ trajectories based only on 
the initial and end points of the trajectories are shown 
in Figure (ini The clusters are less smooth; an intuitive 
explanation for this is that Algorithm 1 only uses point 
information, as opposed to probability flow information 
as in Refs. [Til, [13, and IT^. Algorithm 1 needs to compen¬ 
sate for this by augmenting the point information with 
additional points over time. 


with stream function 

T(x, y, t) = (1 - s{t))4/p -h s{t)4/F 
Tp(x,^) = sin(27rx) sin(7r^) 
Tp(x,^) = sin(7rx) sin(27r^) 


D. Drifter data 

We demonstrate the efficacy of our approach 
on real-world data, namely drifter data from 
the Global Ocean Drifter Program available at 
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FIG. 15: Space-time plot of the two clusters obtained 
from 2^^ trajectories in the transitory double gyre flow. 

Only trajectories with Uk,i > 0.95 are shown, where 
m = 1.5. The solid curves indicate the position of the 
probabilistic cluster centers. 



(a) (b) 

FIG. 16: Membership functions for the 2-clustering of 
transitory double gyre flow - using only initial and end 
points of the trajectories (m = 1.5). 


AOML/NOAA Drifter Data Assembly Genter 
(http://www.aoml.noaa.gov/envids/gld/). The 

entire dataset spans the years 1979-2014, with drifter 
positions given every six hours. The area of observation 
is the global ocean (latitude [90, —78] and longitude 
[—180,180]). We focus on the years 2005-2009 and 
restrict to those drifters that have a minimum lifetime 
of one year within this five-year time span. We output 
the position of these 2267 trajectories (in longitude, 
latitude coordinates) every month, i.e. the length of our 
trajectories is 60 months. 

We note that a typical drifter does not operate over the 
whole five years; that is, many terminate prior to Decem¬ 
ber 2009 and many begin later than January 2005. There 
are also gaps in observations when there is a failure in 
recording the drifter location, so the data is highly incom¬ 
plete. Figure [T7I summarises two statistics: the distribu¬ 
tion of drifter lifetimes and the number of drifters actively 
recording each month. The average lifetime of a drifter 
in this data set is about 23 months, with many drifters 
operating only for a year and only very few drifters for 
4-5 years, see Figure [TTI (a). On average, 869 trajecto¬ 
ries (or 38% of all drifters in the period 2005-2009) are 



FIG. 17: Drifter statistics, (a) histogram of the drifter 
lifetimes, (b) number of drifters available at a certain 
time instance. 


available at a given time instant, with less data at the 
beginning and the end of the considered five year time 
span; see Figure [T7| (b). 

As we consider the global ocean we have to respect dis¬ 
tances on a sphere (we assume the surface of the ocean 
to be approximately spherical). We also have to en¬ 
sure that we restrict cluster centers to the surface of this 
sphere. To achieve both of these requirements we use 
a cosine distance function, and update centers only on 
the sphere^. Every drifter trajectory is represented as a 
vector in 3 X 60 = 180-dimensional space. In contrast to 
our calculations in Algorithm 2, we simply display our 
results in cartesian longitude-latitude coordinates. 

We first look for two clusters; Figure [18] shows results of 
the clustering algorithm for K = 2. Figure [18] (a) shows 
all drifter positions available on January 2005, coloured 
according to their maximum likelihood membership in 
one of the two clusters. Figure [18] (b) shows all drifter 
positions available on December 2009, again coloured ac¬ 
cording to their most likely cluster membership. Thus, 
we expect the red (resp. green) cluster in Figure [18] (a) 
to evolve coherently to the red cluster in Figure [18] (b). 
Of course, many of the drifters in Figure [18] (a) do not 
correspond to the same physical drifter in Figure [18] (b) 
because the lifetimes of many drifters are shorter than 
five years. Nevertheless, as physical drifters enter and 
leave the dataset over the five-year duration, the drifters 
tagged red (resp. green) move as a coherent cloud. This 
is illustrated in a video attached to our electronic sub¬ 
mission, see Figure [18] (Multimedia view). 

In Figure [m one sees a separation of the Pacific Ocean 
(red) from the Atlantic and Indian Oceans (green), which 
are grouped together. Here, continental obstructions 
play an obvious role in the dynamical separation of the 
ocean surface flow. Figure [18] (a) ascribes the southern 
part of the Indian Ocean to the Pacific Ocean. This is 
in line with recent researcbi^ (see Figure 6 in Ref. in 
based on transfer operator analysis of the Ocean Gen¬ 
eral Girculation Model for the Earth Simulator (OEES 
model)^i^, and consistent with a general eastward flow 
of water in high southern latitudes. One observes that 
the red drifters in Eigure[T8](a) have flowed eastwards to 
rejoin the Pacific in Eigure[T8] (b). 
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FIG. 18: 2-clustering of drifter data {m = 1.5). 

(a) drifter positions January 2005, (b) drifter positions 
December 2009. Animation of the cluster motion 
2005-2009 in online version (Multimedia view). 


Figure [19] shows the results of Algorithm 2 with K = 5 
at January 2005 (a), July 2007 (b), and December 2009 
(c). An animation is available online (Figure [19] (Mul¬ 
timedia view)). We choose K = 5 in order to attempt 
to delineate the five major oceans: the North and South 
Atlantic Oceans, the North and South Pacific Oceans, 
and the Indian Ocean. Broadly, we see that the cluster¬ 
ing does find the appropriate equatorial separations of 
the Atlantic and Pacific Oceans, and also separates the 
Indian Ocean. 

Some of these separations are highlighted by inves¬ 
tigating the certainty of membership of the individual 
drifters based on an entropy calculation da. In Figure 
[20] those drifters (positions as of July 2007) are marked 
black when their relative entropy is > 0.1, corresponding 
to a maximum membership value of less than ^ 0.96. 
Figure [20] and in particular the time evolution of the 
drifters (Figure [20] (Multimedia view)) shows that the 
uncertain regions correspond to the major ocean barriers 
in the Atlantic and Pacific, and the Southern Ocean. 

Our results in Figure [19] are strikingly similar to those 
shown in Figure 6 in Ref. [Ill, which have been derived 
using transfer operator methods and the (wind-forced) 
OFFS model. For example in Figure [19] (a), when com¬ 
paring with Figure 6 in Ref. Ei we see: the separation 
of the Pacific Ocean becoming more southerly as one 





FIG. 19: 5-clustering of drifter data {m = 1.5). 

(a) drifter positions January 2005, (b) drifter positions 
July 2007, (c) drifter positions December 2009. 
Animation of the cluster motion 2005-2009 in online 
version (Multimedia view). 


proceeds westwards toward Australia; the Indian Ocean 
spilling westwards at its southerly boundary; and the 
South Atlantic forcing its way around the east coast of 
southern Africa. As described in Ref. ES the Ekman dy¬ 
namics of the ocean surface circulation guarantees that 
each of the five major oceans contains an attracting re¬ 
gion associated to the great oceanic gyres and their corre¬ 
sponding garbage patches. The s^arations seen in Fig¬ 
ure [ini(a) and Figure 6 in Ref. Il9l and the features de¬ 
scribed above are associated with the basins of attraction 
of these five attracting regions. 

We remark that while the results in Ref. [H are of a 




















14 



FIG. 20: Difficult to classify drifters (positions as of 
July 2007) are marked black when their relative entropy 
is h > 0.1. Animation of the cluster motion 2005-2009 
in online version (Multimedia view). 


higher spatial resolution than those obtained here, the 
experiments in Ref. [Tol used just over 10^ trajectories, 
recorded every eight weeks for a period of 48 weeks (a 
total of 6.14 X 10^ data points), while here we have 
869 X 60 = 5.2 x 10^ data points, comprised of 2267 
incomplete trajectories. We also remark that while we 
have drawn comparisons between Ref. [H and the present 
study, the former computed ocean boundaries as basins 
of attraction based on a repeating 48-week ocean circula¬ 
tion, while our present study seeks to compute estimates 
of coherent sets based on five years of non-repeating 
drifter data. 


VI. DISCUSSION 

We have introduced a “rough-and-ready” general 
cluster-based approach for analysing coherent structures 
in time-dependent dynamical systems. Our method as¬ 
signs individual trajectories membership in regions that 
retain a compact extent over a specified finite time du¬ 
ration. Our approach has several advantages. 

First, the ability to work directly with a small number 
of trajectories, including the situations where the trajec¬ 
tories do not span the entire time duration of interest 
and where observations may be missing from within tra¬ 
jectories. Second, initial implementation is rapid (using 
e.g. the built-in MATLAB function fern to perform the 
fuzzy clustering for the case of complete data), and the 
runtimes are fast (on the order of fractions of seconds for 
the one-dimensional maps in Section IV Al to less than 10 
seconds to cluster a dataset of 32768 trajectories in 202 
dimensions, as in the case of the double gyre flow with 
flow duration r = 10 in Section IV Bp . Third, our method 
considers entire trajectories (not just the endpoints) and 
automatically outputs clusters at every time instant in 
the trajectory data; thus a frame-by-frame description of 
the temporal evolution of the clusters is immediately ob¬ 
tained. Fourth, the use of fuzzy clustering provides feed¬ 
back in the form of membership likelihoods and entropy. 


which provide the user with an estimate of confidence 
with which a trajectory has been assigned to a particular 
compact region. Finally, the soft clustering approach is 
relatively insensitive to noise in the data. We note that 
the same methodology can be used to estimate coherent 
regions for SDEs, by simply generating stochastic trajec¬ 
tories and applying Algorithm 1. 
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